Search Result

Chinese speech segmentation method based on Gauss distribution of time spans of syllables

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

Journal of Computer Applications 2016, 36 (5): 1410-1414. DOI: 10.11772/j.issn.1001-9081.2016.05.1410

So far, there is no accurate method for segmenting natural Chinese speech into syllables, a task that matters because it would allow speech to be labeled automatically from a reference text rather than by hand. Based on two hypotheses, namely that the time spans of Chinese syllables with the same pronunciation obey a Gaussian distribution and that a short-time energy valley exists between two adjacent syllables, a Chinese speech segmentation method based on the Gaussian distribution of syllable time spans was proposed. A simplified method based on the distribution of energy valleys was also given, which effectively reduced the time complexity of the segmentation method. The experimental results show that the segmentation accuracy (the mean square of the time differences between manual labels and the labels created by this method) reaches 10^-3, and that the computing time is less than 1 s in Matlab on a PC.
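The two hypotheses in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the framing parameters, and the rule of choosing the single most likely valley are all assumptions made for illustration. The sketch computes a short-time energy contour, finds its local minima (candidate valleys), and picks the valley whose implied syllable duration is most likely under an assumed Gaussian with mean `mean_dur` and standard deviation `std_dur`.

```python
import numpy as np

def short_time_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame (frames start every `hop` samples)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.array([
        np.sum(signal[i * hop : i * hop + frame_len] ** 2)
        for i in range(n_frames)
    ])

def energy_valleys(energy):
    """Indices of strict local minima of the energy contour."""
    e = np.asarray(energy, dtype=float)
    interior = np.arange(1, len(e) - 1)
    is_min = (e[1:-1] < e[:-2]) & (e[1:-1] < e[2:])
    return interior[is_min]

def pick_boundary(valleys, hop, sample_rate, start_time, mean_dur, std_dur):
    """Return the valley time (s) whose duration from `start_time` is most
    likely under N(mean_dur, std_dur**2); None if no valley lies after start.
    The Gaussian parameters are hypothetical inputs, e.g. per-syllable statistics."""
    times = np.asarray(valleys) * hop / sample_rate
    durations = times - start_time
    valid = durations > 0
    if not np.any(valid):
        return None
    log_lik = -0.5 * ((durations[valid] - mean_dur) / std_dur) ** 2
    return float(times[valid][np.argmax(log_lik)])
```

For example, with valleys at frames 1 and 3, a hop of 100 samples at 1000 Hz, and an expected duration of 0.25 s, `pick_boundary` prefers the valley at 0.3 s over the one at 0.1 s.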

Chinese speech segmentation into syllables based on energies in different times and frequencies

ZHANG Yang, ZHAO Xiaoqun, WANG Digang

Journal of Computer Applications 2016, 36 (11): 3222-3228. DOI: 10.11772/j.issn.1001-9081.2016.11.3222

Precise speech segmentation methods, which can also greatly improve the efficiency of corpus annotation, are helpful for comparing speech with speech models in speech recognition. A new method for segmenting Chinese speech into syllables based on time-frequency energy features was proposed: firstly, silence frames were located in the traditional way; secondly, unvoiced frames were located using the difference between energies at different frequencies; thirdly, voiced frames and speech frames were located with the help of 0-1 energies in specific frequency ranges; finally, syllable positions were determined from the above decisions. The experimental results show that the proposed method, with a syllable error of 0.0297 s and a syllable deviation of 7.93%, is superior to the Merging-Based Syllable Detection Automaton (MBSDA) and to the Gauss fitting method.
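The idea of separating frame types by energy in different frequency ranges can be illustrated with a short Python sketch. It is a generic simplification, not the method of the paper: the function name, the silence threshold, the 2000 Hz split frequency, and the rule "more high-band than low-band energy means unvoiced" are all assumptions chosen for illustration.

```python
import numpy as np

def classify_frame(frame, sample_rate, silence_thresh=1e-4, split_hz=2000.0):
    """Label one frame as 'silence', 'unvoiced', or 'voiced'.

    silence_thresh and split_hz are hypothetical values, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # Mean power of the frame: very low power is treated as silence.
    power = np.sum(frame ** 2) / len(frame)
    if power < silence_thresh:
        return "silence"
    # Power spectrum and the frequency (Hz) of each bin.
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    low = spectrum[freqs < split_hz].sum()
    high = spectrum[freqs >= split_hz].sum()
    # Noise-like unvoiced sounds tend to carry more high-frequency energy.
    return "unvoiced" if high > low else "voiced"
```

Applied frame by frame, such labels give a sequence from which syllable boundaries could then be derived; how the paper combines its decisions is not specified in the abstract.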
